tutorials/021 - Global Configurations.ipynb (607 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 21 - Global Configurations\n",
"\n",
"[awswrangler](https://github.com/aws/aws-sdk-pandas) provides two ways to set global configurations that override the default argument values declared in function signatures:\n",
"\n",
"- **Environment variables**\n",
"- **wr.config**\n",
"\n",
"*P.S. Check the [function API doc](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html) to see whether a given function has arguments that can be set through global configurations.*\n",
"\n",
"*P.P.S. The one exception to the rules above is the `botocore_config` property: it cannot be set through environment variables, only via `wr.config`. It is used as the `botocore.config.Config` for all underlying `boto3` calls.\n",
"The default config is `botocore.config.Config(retries={\"max_attempts\": 5}, connect_timeout=10, max_pool_connections=10)`.\n",
"If you only want to change the retry behavior, you can use the environment variables `AWS_MAX_ATTEMPTS` and `AWS_RETRY_MODE` instead\n",
"(see the [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables)).*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment Variables"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"env: WR_DATABASE=default\n",
"env: WR_CTAS_APPROACH=False\n",
"env: WR_MAX_CACHE_SECONDS=900\n",
"env: WR_MAX_CACHE_QUERY_INSPECTIONS=500\n",
"env: WR_MAX_REMOTE_CACHE_ENTRIES=50\n",
"env: WR_MAX_LOCAL_CACHE_ENTRIES=100\n"
]
}
],
"source": [
"%env WR_DATABASE=default\n",
"%env WR_CTAS_APPROACH=False\n",
"%env WR_MAX_CACHE_SECONDS=900\n",
"%env WR_MAX_CACHE_QUERY_INSPECTIONS=500\n",
"%env WR_MAX_REMOTE_CACHE_ENTRIES=50\n",
"%env WR_MAX_LOCAL_CACHE_ENTRIES=100"
]
},
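{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `%env` magic only works inside IPython/Jupyter. In a plain Python script, a rough equivalent is to set the same variables through `os.environ`, as in the sketch below. Two assumptions are made here: the values are written as strings (so `False` becomes `\"False\"`), and the variables are set before `awswrangler` is imported, which mirrors the order of the cells in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Same settings as the %env lines above, expressed through os.environ.\n",
"# os.environ only accepts strings, so numbers and booleans are quoted.\n",
"os.environ.update(\n",
"    {\n",
"        \"WR_DATABASE\": \"default\",\n",
"        \"WR_CTAS_APPROACH\": \"False\",\n",
"        \"WR_MAX_CACHE_SECONDS\": \"900\",\n",
"        \"WR_MAX_CACHE_QUERY_INSPECTIONS\": \"500\",\n",
"        \"WR_MAX_REMOTE_CACHE_ENTRIES\": \"50\",\n",
"        \"WR_MAX_LOCAL_CACHE_ENTRIES\": \"100\",\n",
"    }\n",
")"
]
},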
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import botocore\n",
"\n",
"import awswrangler as wr"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" foo\n",
"0 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wr.athena.read_sql_query(\"SELECT 1 AS FOO\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resetting"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Reset a single configuration by name\n",
"wr.config.reset(\"database\")\n",
"# Reset all configurations at once\n",
"wr.config.reset()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## wr.config"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"wr.config.database = \"default\"\n",
"wr.config.ctas_approach = False\n",
"wr.config.max_cache_seconds = 900\n",
"wr.config.max_cache_query_inspections = 500\n",
"wr.config.max_remote_cache_entries = 50\n",
"wr.config.max_local_cache_entries = 100\n",
"# Set botocore.config.Config that will be used for all boto3 calls\n",
"wr.config.botocore_config = botocore.config.Config(\n",
" retries={\"max_attempts\": 10}, connect_timeout=20, max_pool_connections=20\n",
")"
]
},
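{
"cell_type": "markdown",
"metadata": {},
"source": [
"Values assigned through `wr.config` can be read back as plain attributes, which is a quick way to confirm that a setting took effect before running a query. The cell below is a minimal sketch that assumes the assignments in the previous cell have already been executed. The `retries` and `connect_timeout` attributes belong to `botocore.config.Config`, which exposes the options it was constructed with."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read back a few of the values set in the previous cell\n",
"print(wr.config.database)\n",
"print(wr.config.max_cache_seconds)\n",
"\n",
"# botocore.config.Config exposes its constructor options as attributes\n",
"print(wr.config.botocore_config.retries)\n",
"print(wr.config.botocore_config.connect_timeout)"
]
},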
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" foo\n",
"0 1"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wr.athena.read_sql_query(\"SELECT 1 AS FOO\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>Env. Variable</th>\n",
" <th>type</th>\n",
" <th>nullable</th>\n",
" <th>enforced</th>\n",
" <th>configured</th>\n",
" <th>value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>catalog_id</td>\n",
" <td>WR_CATALOG_ID</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>concurrent_partitioning</td>\n",
" <td>WR_CONCURRENT_PARTITIONING</td>\n",
" <td><class 'bool'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ctas_approach</td>\n",
" <td>WR_CTAS_APPROACH</td>\n",
" <td><class 'bool'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>database</td>\n",
" <td>WR_DATABASE</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>default</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>max_cache_query_inspections</td>\n",
" <td>WR_MAX_CACHE_QUERY_INSPECTIONS</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>max_cache_seconds</td>\n",
" <td>WR_MAX_CACHE_SECONDS</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>max_remote_cache_entries</td>\n",
" <td>WR_MAX_REMOTE_CACHE_ENTRIES</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>max_local_cache_entries</td>\n",
" <td>WR_MAX_LOCAL_CACHE_ENTRIES</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>s3_block_size</td>\n",
" <td>WR_S3_BLOCK_SIZE</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>workgroup</td>\n",
" <td>WR_WORKGROUP</td>\n",
" <td><class 'str'></td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>chunksize</td>\n",
" <td>WR_CHUNKSIZE</td>\n",
" <td><class 'int'></td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>s3_endpoint_url</td>\n",
" <td>WR_S3_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>athena_endpoint_url</td>\n",
" <td>WR_ATHENA_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>sts_endpoint_url</td>\n",
" <td>WR_STS_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>glue_endpoint_url</td>\n",
" <td>WR_GLUE_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>redshift_endpoint_url</td>\n",
" <td>WR_REDSHIFT_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>kms_endpoint_url</td>\n",
" <td>WR_KMS_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>emr_endpoint_url</td>\n",
" <td>WR_EMR_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>dynamodb_endpoint_url</td>\n",
" <td>WR_DYNAMODB_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>secretsmanager_endpoint_url</td>\n",
" <td>WR_SECRETSMANAGER_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>timestream_endpoint_url</td>\n",
" <td>WR_TIMESTREAM_ENDPOINT_URL</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>botocore_config</td>\n",
" <td>WR_BOTOCORE_CONFIG</td>\n",
" <td><class 'botocore.config.Config'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td><botocore.config.Config object at 0x14f313e50></td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>verify</td>\n",
" <td>WR_VERIFY</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>address</td>\n",
" <td>WR_ADDRESS</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>redis_password</td>\n",
" <td>WR_REDIS_PASSWORD</td>\n",
" <td><class 'str'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>ignore_reinit_error</td>\n",
" <td>WR_IGNORE_REINIT_ERROR</td>\n",
" <td><class 'bool'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>include_dashboard</td>\n",
" <td>WR_INCLUDE_DASHBOARD</td>\n",
" <td><class 'bool'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>log_to_driver</td>\n",
" <td>WR_LOG_TO_DRIVER</td>\n",
" <td><class 'bool'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>object_store_memory</td>\n",
" <td>WR_OBJECT_STORE_MEMORY</td>\n",
" <td><class 'int'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>cpu_count</td>\n",
" <td>WR_CPU_COUNT</td>\n",
" <td><class 'int'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>gpu_count</td>\n",
" <td>WR_GPU_COUNT</td>\n",
" <td><class 'int'></td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<awswrangler._config._Config at 0x1376ece80>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wr.config"
]
},
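{
"cell_type": "markdown",
"metadata": {},
"source": [
"The HTML table above is convenient in a notebook. For programmatic inspection, the same information can probably be loaded into a DataFrame instead. The cell below is a minimal sketch that assumes `wr.config.to_pandas()` is available in your installed version and returns the `name`, `configured` and `value` columns shown in the table above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the configuration table into a pandas DataFrame\n",
"df = wr.config.to_pandas()\n",
"\n",
"# Keep only the settings that currently have a value configured\n",
"df.loc[df[\"configured\"], [\"name\", \"value\"]]"
]
},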
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "bd595004b250e5f4145a0d632609b0d8f97d1ccd278d58fafd6840c0467021f9"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}